7 research outputs found
Monocular Vision based Crowdsourced 3D Traffic Sign Positioning with Unknown Camera Intrinsics and Distortion Coefficients
Autonomous vehicles and driver assistance systems utilize maps of 3D semantic
landmarks for improved decision making. However, scaling the mapping process as
well as regularly updating such maps come with a huge cost. Crowdsourced
mapping of these landmarks such as traffic sign positions provides an appealing
alternative. The state-of-the-art approaches to crowdsourced mapping use ground
truth camera parameters, which may not always be known or may change over time.
In this work, we demonstrate an approach to computing 3D traffic sign positions
without knowing the camera focal lengths, principal point, and distortion
coefficients a priori. We validate our proposed approach on a public dataset of
traffic signs in KITTI. Using only a monocular color camera and GPS, we achieve
an average single journey relative and absolute positioning accuracy of 0.26 m
and 1.38 m, respectively.
Comment: Accepted at 2020 IEEE 23rd International Conference on Intelligent
Transportation Systems (ITSC)
Adversarial Attacks on Monocular Pose Estimation
Advances in deep learning have resulted in steady progress in computer vision
with improved accuracy on tasks such as object detection and semantic
segmentation. Nevertheless, deep neural networks are vulnerable to adversarial
attacks, thus presenting a challenge in reliable deployment. Two of the
prominent tasks in 3D scene-understanding for robotics and advanced driver
assistance systems are monocular depth and pose estimation, often learned
together in an unsupervised manner. While studies evaluating the impact of
adversarial attacks on monocular depth estimation exist, a systematic
demonstration and analysis of adversarial perturbations against pose estimation
are lacking. We show how additive imperceptible perturbations can not only
change predictions to increase the trajectory drift but also catastrophically
alter its geometry. We also study the relation between adversarial
perturbations targeting monocular depth and pose estimation networks, as well
as the transferability of perturbations to other networks with different
architectures and losses. Our experiments show how the generated perturbations
lead to notable errors in relative rotation and translation predictions and
elucidate vulnerabilities of the networks.
Comment: Accepted at the 2022 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS 2022)
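As a minimal sketch of what an "additive imperceptible perturbation" can look like in practice, the snippet below adds a perturbation to an image and keeps it small. The abstract does not state how imperceptibility is enforced; the per-pixel bound `epsilon`, the [0, 1] pixel range, and the function name `apply_perturbation` are assumptions made for illustration only.

```python
def apply_perturbation(image, delta, epsilon):
    """Add a bounded perturbation to a flat list of pixel values in [0, 1].

    Assumption (not from the abstract): imperceptibility is modelled as a
    per-pixel bound |delta_i| <= epsilon.
    """
    perturbed = []
    for pixel, d in zip(image, delta):
        # Clip the perturbation itself to the allowed budget.
        d = max(-epsilon, min(epsilon, d))
        # Clip the result so it remains a valid pixel intensity.
        perturbed.append(max(0.0, min(1.0, pixel + d)))
    return perturbed


adversarial = apply_perturbation([0.5, 0.99, 0.0], [0.1, 0.05, -0.2], epsilon=0.02)
```

How `delta` is chosen (for example, by following the gradient of a pose loss) is the actual attack and is not shown here.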
Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation
Dense depth estimation is essential to scene-understanding for autonomous
driving. However, recent self-supervised approaches on monocular videos suffer
from scale-inconsistency across long sequences. Utilizing data from the
ubiquitously copresent global positioning systems (GPS), we tackle this
challenge by proposing a dynamically-weighted GPS-to-Scale (g2s) loss to
complement the appearance-based losses. We emphasize that the GPS is needed
only during the multimodal training, and not at inference. The relative
distance between frames captured through the GPS provides a scale signal that
is independent of the camera setup and scene distribution, resulting in richer
learned feature representations. Through extensive evaluation on multiple
datasets, we demonstrate scale-consistent and -aware depth estimation during
inference, improving the performance even when training with low-frequency GPS
data.
Comment: Accepted at 2021 IEEE International Conference on Robotics and
Automation (ICRA)
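The idea of using the GPS-derived distance between frames as a scale signal can be sketched as follows. This is an illustrative sketch, not the paper's exact formulation: the absolute-difference form of the loss, the linear warm-up used for the dynamic weight, and the names `g2s_loss` and `warmup_weight` are all assumptions. GPS fixes are assumed to be already converted to local metric (x, y, z) coordinates.

```python
import math


def g2s_loss(pred_translation, gps_prev, gps_curr, weight):
    """Penalise the gap between the predicted inter-frame translation
    magnitude and the distance travelled according to GPS (in metres)."""
    pred_distance = math.hypot(*pred_translation)
    gps_distance = math.dist(gps_prev, gps_curr)
    return weight * abs(pred_distance - gps_distance)


def warmup_weight(epoch, ramp_epochs=5):
    """Hypothetical dynamic weight: ramps linearly from 0 to 1 so that the
    appearance-based losses dominate early in training."""
    return min(1.0, epoch / ramp_epochs)


# The network predicts 0.5 m of motion, while GPS reports 1.0 m.
loss = g2s_loss((0.3, 0.0, 0.4), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), warmup_weight(5))
```

Because the term is only a training loss, nothing in this sketch is needed at inference, which matches the abstract's point that GPS is required only during training.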
Crowdsourced 3D Mapping: A Combined Multi-View Geometry and Self-Supervised Learning Approach
The ability to efficiently utilize crowdsourced visual data carries immense
potential for the domains of large scale dynamic mapping and autonomous
driving. However, state-of-the-art methods for crowdsourced 3D mapping assume
prior knowledge of camera intrinsics. In this work, we propose a framework that
estimates the 3D positions of semantically meaningful landmarks such as traffic
signs without assuming known camera intrinsics, using only a monocular color
camera and GPS. We utilize multi-view geometry as well as deep learning based
self-calibration, depth, and ego-motion estimation for traffic sign
positioning, and show that combining their strengths is important for
increasing the map coverage. To facilitate research on this task, we construct
and make available a KITTI based 3D traffic sign ground truth positioning
dataset. Using our proposed framework, we achieve an average single-journey
relative and absolute positioning accuracy of 39 cm and 1.26 m, respectively, on
this dataset.
Comment: Accepted at 2020 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS)
Practical Auto-Calibration for Spatial Scene-Understanding from Crowdsourced Dashcamera Videos
Spatial scene-understanding, including dense depth and ego-motion estimation,
is an important problem in computer vision for autonomous vehicles and advanced
driver assistance systems. Thus, it is beneficial to design perception modules
that can utilize crowdsourced videos collected from arbitrary vehicular onboard
or dashboard cameras. However, the intrinsic parameters corresponding to such
cameras are often unknown or change over time. Typical manual calibration
approaches require objects such as a chessboard or additional scene-specific
information. On the other hand, automatic camera calibration does not have such
requirements. Yet, the automatic calibration of dashboard cameras is
challenging as forward and planar navigation results in critical motion
sequences with reconstruction ambiguities. Structure reconstruction of complete
visual-sequences that may contain tens of thousands of images is also
computationally untenable. Here, we propose a system for practical monocular
onboard camera auto-calibration from crowdsourced videos. We show the
effectiveness of our proposed system on the KITTI raw, Oxford RobotCar, and the
crowdsourced D-City datasets in varying conditions. Finally, we demonstrate
its application for accurate monocular dense depth and ego-motion estimation on
uncalibrated videos.
Comment: Accepted at 16th International Conference on Computer Vision Theory
and Applications (VISAPP, 2021)
AI-Driven Road Maintenance Inspection v2: Reducing Data Dependency & Quantifying Road Damage
Road infrastructure maintenance inspection is typically a labor-intensive and
critical task to ensure the safety of all road users. Existing state-of-the-art
techniques in Artificial Intelligence (AI) for object detection and
segmentation help automate a huge chunk of this task given adequate annotated
data. However, annotating videos from scratch is cost-prohibitive. For
instance, it can take an annotator several days to annotate a 5-minute video
recorded at 30 FPS. Hence, we propose an automated labelling pipeline by
leveraging techniques like few-shot learning and out-of-distribution detection
to generate labels for road damage detection. In addition, our pipeline
includes a risk factor assessment for each damage by instance quantification to
prioritize locations for repairs which can lead to optimal deployment of road
maintenance machinery. We show that the AI models trained with these techniques
can not only generalize better to unseen real-world data with reduced
requirement for human annotation but also provide an estimate of maintenance
urgency, thereby leading to safer roads.
Comment: Accepted at IRF Global R2T Conference & Exhibition 202
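To make the annotation cost in the abstract above concrete, the frame count of the quoted 5-minute, 30 FPS video can be worked out directly. The helper name `frames_in_clip` is hypothetical.

```python
def frames_in_clip(minutes, fps):
    """Number of frames in a clip of the given length and frame rate."""
    return int(minutes * 60 * fps)


total_frames = frames_in_clip(5, 30)  # 5 min * 60 s/min * 30 frames/s = 9000
```

Even a short clip therefore contains thousands of frames, which helps explain why annotating one from scratch can take several days.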
Robot Placement for Mobile Manipulation in Domestic Environments
The development of domestic mobile manipulators for unconstrained environments has driven significant research recently. Robot Care Systems has pioneered the development of a prototype mobile manipulator for elderly care. It has a 6-degrees-of-freedom robotic arm mounted on the company's flagship robot LEA, a non-holonomic differential drive platform. To utilize the navigation and manipulation capabilities of such mobile manipulators, a robot placement algorithm is sought that computes a favorable position and orientation of the mobile base, enabling the end effector to reach a desired target. None of the existing approaches perform robot placement while ensuring a high chance of successful planning to the target through a short path and accounting for the sensing and actuation errors typical in real-world scenarios. This thesis presents a novel robot placement algorithm, DeCOWA (Determining Commutation configuration using Optimization and Workspace Analysis), with these characteristics. Since the approach to robot placement depends on the kind of mobile manipulation, a comparative study of sequential and full-body methods is performed with respect to criteria important in domestic settings. Sequential mobile manipulation is found to be most suitable, and a modular mobile manipulation framework encompassing motion planning and robot placement is presented for it. With sequential mobile manipulation, the ability to successfully reach a target depends on the kinematic capabilities of the arm. Accordingly, robot placement with DeCOWA determines a favorable location for the arm and the corresponding platform orientation. To find the position of the arm’s base, an offline manipulator workspace analysis is performed, generating the Inverse Reachability and Planability maps. During online use, these maps are combined into an Inverse Fusion Map that ranks different locations based on the ability of the arm placed there to find a successful and short motion plan to the target. This map is filtered to generate a set of feasible locations at the arm’s height. Through a ranked iterative search, a suitable collision-free arm location is determined, followed by minimization of the platform distance from the robot’s current pose. This approach is evaluated against an unbiased random placement of the robot near the target using a sample set of twenty scenes mimicking domestic settings. It is found that DeCOWA is able to generate commutation configurations in a fraction of a second that lead to a high planning success rate and a short path length, and that account for the goal tolerance of navigation. Also, its modularity allows the use of several planability metrics, making it useful for domestic applications.
Mechanical Engineering | Biomechanical Design - BioRobotic
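The online stage described in the abstract above (fuse the two maps, filter to the arm's height, then search the ranked candidates for a collision-free location) can be sketched as below. This is an illustrative sketch under stated assumptions, not the thesis's implementation: the maps are modelled as dictionaries from (x, y, z) locations to scores, the fusion rule is assumed to be a simple product, and `fuse_and_rank`, `first_collision_free`, and the `in_collision` callback are hypothetical names.

```python
def fuse_and_rank(reachability, planability, arm_height, tol=0.01):
    """Combine two score maps and rank candidate arm-base locations.

    Assumption: the fused score is the product of the two map scores.
    Only locations within `tol` metres of the arm's height are kept.
    """
    ranked = []
    for loc, reach_score in reachability.items():
        if abs(loc[2] - arm_height) > tol:
            continue  # not at the arm's height, so not feasible
        fused_score = reach_score * planability.get(loc, 0.0)
        ranked.append((fused_score, loc))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


def first_collision_free(ranked, in_collision):
    """Ranked iterative search: return the best location that is collision free."""
    for _score, loc in ranked:
        if not in_collision(loc):
            return loc
    return None


reach = {(0.4, 0.0, 0.8): 0.9, (0.6, 0.1, 0.8): 0.7, (0.5, 0.0, 1.2): 1.0}
plan = {(0.4, 0.0, 0.8): 0.8, (0.6, 0.1, 0.8): 0.9, (0.5, 0.0, 1.2): 1.0}
ranked = fuse_and_rank(reach, plan, arm_height=0.8)
# Pretend the top-ranked location is blocked by an obstacle.
best = first_collision_free(ranked, in_collision=lambda loc: loc == (0.4, 0.0, 0.8))
```

The final step named in the abstract, minimizing the platform's distance from the robot's current pose, is not shown here.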